Drawing graphs
Our data
To illustrate making graphs, we need some data.
Data on 202 male and female athletes at the Australian Institute of Sport.
Variables:
categorical: Sex of athlete, sport they play
quantitative: height (cm), weight (kg), lean body mass, red and white blood cell counts, haematocrit and haemoglobin (blood), ferritin concentration, body mass index, percent body fat.
Values are separated by tabs (which affects how we read the file in).
Packages for this section
library(tidyverse)
Reading data into R
Use read_tsv (“tab-separated values”), which works like read_csv.
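Here is a minimal sketch of read_tsv in action. The file name `athletes.txt`, the column names and the two data rows are invented for illustration. They are not the real AIS data. The sketch writes them to a temporary file so that it runs on its own.

```r
library(tidyverse)

# Two invented rows in tab-separated form (hypothetical values, not the AIS data)
lines <- c(
  "Sex\tSport\tHt\tWt",
  "female\tNetball\t176.8\t59.9",
  "male\tRowing\t188.4\t89.1"
)
my_file <- file.path(tempdir(), "athletes.txt") # hypothetical file name
writeLines(lines, my_file)

# read_tsv() splits each line at the tabs; read_csv() would look for commas instead
athletes <- read_tsv(my_file)
athletes
```

`read_delim(my_file, delim = "\t")` should give the same result, because read_tsv is the tab-delimited special case of read_delim.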
The distribution of BMI for females is closer to normal, with only the highest few values being too high.
The distribution of BMI values for males may even be right-skewed: not only are the upper values too high, but some of the lowest ones are not low enough.
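The comparison above uses separate normal quantile plots of BMI for each sex. The sketch below shows one way to draw them with `facet_wrap()`. It assumes a data frame `athletes` with columns `Sex` and `BMI`, which are assumed names. Because the real data are not reproduced here, the sketch builds a stand-in tibble of simulated values, so its plots will not match the ones described.

```r
library(tidyverse)

# Stand-in data: simulated BMI values, NOT the real athletes' measurements
set.seed(1)
athletes <- tibble(
  Sex = rep(c("female", "male"), each = 50),
  BMI = c(rnorm(50, mean = 22, sd = 2), rnorm(50, mean = 24, sd = 2.5))
)

# One normal quantile plot per sex; the reference line is computed within each panel
p <- ggplot(athletes, aes(sample = BMI)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ Sex)
p
```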
More normal quantile plots
How straight does a normal quantile plot have to be?
There is randomness in real data, so even a normal quantile plot from normal data won’t look perfectly straight.
With a small sample, the plot can look far from straight even when the data are normal.
Looking for systematic departure from a straight line; random wiggles ought not to concern us.
Look at some examples where we know the answer, so that we can see what to expect.
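One way to calibrate your eye is to draw several samples of the same size from a normal distribution and put their normal quantile plots side by side. This is a suggestion, not something taken from these notes. Every panel comes from truly normal data, so any crookedness is just random wiggle. The sample size of 20 and the number of samples (six) are arbitrary choices.

```r
library(tidyverse)

# Six samples of 20 values each, all drawn from a standard normal distribution
set.seed(2)
sims <- tibble(
  sample_id = rep(1:6, each = 20),
  x = rnorm(6 * 20)
)

# One normal quantile plot per sample; any departure from the line is pure chance
p <- ggplot(sims, aes(sample = x)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ sample_id)
p
```

If you rerun this with a larger sample size, say 200 per sample, the panels should look noticeably straighter.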
Normal data, large sample
d <- tibble(x = rnorm(200)) # 200 random values from a standard normal
ggplot(d, aes(x = x)) + geom_histogram(bins = 10)
The normal quantile plot
ggplot(d, aes(sample = x)) + stat_qq() + stat_qq_line()
Normal data, small sample
Not so convincingly normal, but not obviously skewed:
d <- tibble(x = rnorm(20)) # only 20 random values from a standard normal
ggplot(d, aes(x = x)) + geom_histogram(bins = 5)
The normal quantile plot
Good, apart from the highest and lowest points being slightly off. I’d call this good:
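The plot itself is not reproduced here. The sketch below shows code that would draw it. It assumes the code mirrors the large-sample version, with `d` holding 20 values. The sample is regenerated, so the exact points will differ from the plot being described.

```r
library(tidyverse)

# 20 random values from a standard normal (a fresh sample, not the one described)
d <- tibble(x = rnorm(20))

# Normal quantile plot; by default the reference line passes through the quartiles
p <- ggplot(d, aes(sample = x)) + stat_qq() + stat_qq_line()
p
```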
Comments